Abstract. Ontologies are often used to annotate information (metadata) that is passed between domains
during negotiation. In that sense, ontology matching is critical for the receiving domain to gather the correct
meaning of the data, and hence critical for interoperability. Many ontology matching algorithms have been
proposed in the literature, but in general they all assume that there is a considerable amount of knowledge
about both ontologies (sender and recipient). This assumption does not hold in many cases. In this paper, we
present an approach that does not require such an assumption, allowing the parties to keep a considerable
amount of secrecy about their ontologies while still providing the required matching functionality.
1 Introduction

Ontologies are often used to annotate information (metadata) that is passed between domains during
negotiation. In that sense, ontology matching is critical for the receiving domain to gather the correct meaning
of the data, and hence critical for interoperability. Many ontology matching algorithms have been proposed in
the literature, but in general they all assume that there is a considerable amount of knowledge about both
ontologies (sender and recipient).

The manner in which an ontology is organized can give valuable insights into the organization’s knowledge
representation and the importance, complexity or amount of data expressed in this knowledge base. This
information is in itself very valuable, and the assumption that negotiations can happen across domain boundaries
with full disclosure of the domains’ ontologies is naive at best.

In this paper, we present an approach that does not require such an assumption, allowing the parties to keep a
considerable amount of secrecy about their ontologies while still providing the required matching functionality.
In fact, we want, as much as possible, to keep the upkeep of the sending and receiving ontologies separated. This
creates an extra layer of complexity for the ontology matching problem, since the metadata associated with the
information must be converted to the receiving ontology at the domain boundary.

The ontology matching process across domain boundaries has some extra requirements beyond the traditional
academic problem that make it unique. Some of these key issues are:

• Multilingual/multicultural. One important issue in the cross-domain arena is that the ontologies to be
matched may be in different languages (multi-national negotiations), hence syntactic proximity is not relevant.

• Independent management/runtime matching. Another important issue is the ability to handle the matches
quickly at runtime, without an extensive preparatory effort, thus allowing the ontologies to be managed
independently.

• Limited information exchange. In the cross-domain case, the participants in the ontology matching may not
want to disclose their full ontology, but only the information necessary for a correct matching to be performed.
One must remember that the need for ontology matching is often not a translation, but only the adjustment of
the metadata and the coherence and continuity of its properties.

Although the existing literature does not directly apply to this practical extended problem, we were able to
find relevant work that we believe can be adapted/enhanced to work in our domain.
1.1 Related work


Ontology  matching  is  the  process  of  finding  semantic  correspondence  between  similar  entities  of  different 
ontologies.  A  lot  of  work  has  addressed  the  problem  of  ontology  matching.  Here  we  describe  five  major 
matching  methods  that  have  been  reported  in  the  literature,  including  graph-based  matching,  linguistic 
matching,  hybrid  matching,  learning  based  matching  and  probabilistic  matching. 

1.  Graph-based  matching  or  structural  matching  uses  graphs  to  represent  ontologies  and  computes 
structural  similarities  of  graphs.  Examples  of  graph-based  matching  include  GMO  [1],  Anchor-Prompt  [2],  and 
Similarity  Flooding  [3].  GMO  is  an  iterative  structural  matcher,  which  uses  RDF  bipartite  graphs  to  represent 
ontologies  and  computes  structural  similarities  between  entities  by  recursively  propagating  their  similarities  in 
the bipartite graphs. This is an approach that we can possibly exploit, and hence we take a closer look at it in
the section “Adjacency Matrix-Based Matching Algorithm” below. Anchor-Prompt is an ontology merging and mapping
tool,  which  treats  ontologies  as  directed  labeled  graphs.  The  basic  idea  is  that  if  two  pairs  of  entities  are  similar 
and  there  are  paths  connecting  them,  then  the  entities  in  these  paths  are  often  similar  as  well.  Similarity 
Flooding  is  a  graph  matcher  which  uses  fixpoint  computation  to  determine  corresponding  nodes  in  the  graphs. 
The  basic  idea  is  that  the  similarity  between  two  nodes  depends  on  the  similarity  between  their  adjacent  nodes, 
or  similarities  of  nodes  can  propagate  to  their  respective  neighbors. 

2. Linguistic matching relies on the construction of virtual documents. The virtual document of an entity in
an ontology contains the local descriptions as well as neighboring information that conveys the meaning of the
entity. Calculating the similarities of entities then translates to the problem of calculating document
similarities using traditional vector space techniques. V-Doc [4] is an example of a linguistic matcher. It
exploits the RDF graph to extract the description information from three sorts of neighboring entities: subject
neighbors, predicate neighbors and object neighbors.

3.  Hybrid  matching  uses  linguistic  information  (e.g.,  name,  label,  and  description)  and  structural 
information  (e.g.,  key  properties,  taxonomic  structure)  to  find  correspondences  between  entities.  For  example, 
PROMPT  [5]  is  a  hybrid  matching  tool  for  user  oriented  ontology  merging.  To  make  the  initial  suggestions,  it 
uses  a  measure  of  linguistic  similarity  among  concept  names  and  mixes  it  with  the  structure  of  the  ontology  and 
user’s  actions.  For  each  operation,  it  finds  conflicts  that  the  operation  may  introduce  and  presents  new 
suggestions  to  the  user. 

4. Learning-based matching is efficient when instances are available in ontologies. GLUE [6] is an
example of a learning-based matching system. It first applies statistical analysis to the available data and uses
multiple learners to exploit information in concept instances and the taxonomic structure of ontologies. It then
uses a probabilistic model to combine the results of the different learners. Finally, it adopts a relaxation
labeling approach to search for the mapping that best satisfies the domain constraints and the common knowledge.

5. Probabilistic matching is also used at the instance level in ontology matching. For example, OMEN [7] is
a tool which describes mappings using probabilities and infers new mappings by means of Bayesian Network
inference.

The rest of this paper is organized as follows: In Section 2, we present the overall approach for the extended
cross-domain ontology matching problem, and in Section 3 we present an example of the proposed matching
methodology. Section 4 gives more details on how to successfully implement such a methodology. Finally,
Section 5 discusses our conclusions and the future work in this area.
2. Overview of Approach


In a collaborative environment, every participant has their own priorities and perspectives of their reality;
they each have their own domain model. The domain model highlights what is considered important and
formally structures the domain. Such a domain model is the basis of the ontology used to describe the concepts
in the domain. It is clear, however, that the ontology of one collaborative participant does not always match the
ontology of another participant. In fact, it is highly unlikely that this is the case, while at the same time it is
likely that the ontologies overlap to some degree. After all, the participants have a desire to communicate, so it
is likely they have some overlapping terms in their ontologies. We will later refer to these overlapping terms as
anchors.

In a cross-domain environment that intends to share information between domains, security plays an
important role. Hence, not only do domains have an ontology that describes their world, but they also have
limitations (policies) on how much of this ontology can be shared. These policies refer to the ontology, since
the policies are specified over the ontology terms (the resources in the domain). We will not focus on the
policies here, but it is important to understand that there are properties associated with the ontology terms and
policies that limit the amount of information that can be shared.

So, each domain has a domain ontology and security restrictions specified by policies. This means that any
data in the domain is classified according to the ontology, and hence has access restrictions in place.

The premise of interoperability is that information (data) can flow between domains and have its key
properties recognized, while still being able to ensure the policy restrictions. In short, the premise of
interoperability is that data requires metadata to be understood, and the realization that there are limitations on
how to translate the metadata context greatly increases the overall complexity of the principle. An illustration
of the setup is given in Figure 1.


Figure 1. Associated metadata (policies) is always attached to the information.

Since the ontologies differ, the security assurances (policies) also do not fit exactly. Nonetheless, the data
sent across domain boundaries eventually needs to be stored according to the ontology of the receiving domain,
and secured by its policies. Hence, to overcome these problems, some issues must be addressed:

• Ontologies cannot be assumed to be fully shared or disclosed between domains, since each domain wants
to protect the details of its domain understanding.

• Even if we can identify an appropriate concept in the receiving domain ontology that can be used to
classify a data item, we cannot assume that the sending domain fully accepts the policy restrictions that are
placed on this resource concept.

Even if domains do not want to fully disclose their ontologies, they can agree on certain concepts that they
share and can disclose. These concepts would be determined manually by human representatives of the domains.
To be consistent with the nomenclature used in previous literature [8], we call these concepts anchors.

By having a guaranteed partial ontology overlap, it is possible to match a concept in the sending domain
ontology with a concept in the receiving domain ontology to a sufficient degree of accuracy. Even if the match
is not exact, the sending domain might agree with the security assurances and overall properties provided by the
receiving domain and may send the data with confidence. An initial overview of the process is shown in Figure 2.


In  more  detail,  the  steps  are  the  following: 

1. Describe matching metrics. The first step is for the sending domain to get assurances that the data can
eventually be sent with a sufficiently matching concept in the receiving domain. Hence, before any data is
actually sent, the domains need to negotiate. The data to be sent from the sending domain has some metadata
associated with it. In particular, it is declared to be an instance of a particular concept C in the domain
ontology. The sending domain, averse to disclosing too much of its ontology, effectively decides on an
appropriate subset of the ontology to send over to the receiving domain. This subset is, however, made
anonymous: all entity names, except the anchors (which are already shared), are removed (or changed to
meaningless names).
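
The anonymization described above can be sketched as follows. This is only an illustration, assuming the selected ontology subset is held as (subject, relation, object) triples; the function name `anonymize` and the placeholder scheme `e0, e1, ...` are hypothetical and not part of the approach as published. Relation names are left untouched in this sketch.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def anonymize(triples: List[Triple], anchors: Set[str]) -> List[Triple]:
    """Replace every non-anchor entity name with a meaningless placeholder."""
    alias: Dict[str, str] = {}

    def rename(name: str) -> str:
        if name in anchors:          # anchors are already shared, keep them
            return name
        if name not in alias:        # the same entity always gets the same alias
            alias[name] = f"e{len(alias)}"
        return alias[name]

    return [(rename(s), rel, rename(o)) for s, rel, o in triples]

# Hypothetical subset: C is a subclass of X, which is a subclass of an anchor.
subset = [("C", "IS-A", "X"), ("X", "IS-A", "Anchor Concept 1")]
anonymous_subset = anonymize(subset, anchors={"Anchor Concept 1"})
```

Only the anchor name survives in `anonymous_subset`; the alias table stays local to the sending domain.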

The aim of the sending domain is to provide enough information to the receiving domain such that a similar
concept can be found in its ontology, by which the received information can eventually be categorized and
secured by policies. It should be noted that more than a simple topology of the selected ontology subset is
communicated. Rather, what is sent is a set of metrics that can be used for matching against the receiving
domain ontology. These metrics are descriptions of how the concept C relates to the anchors. This is important
since it is the only way for the receiving domain to be able to find a matching concept (since the domains share
only the anchors).

An example of a metric for concept C could be:

(IS-A, 1, “Anchor Concept 1”).

This metric says that the concept C is a subclass of the anchor “Anchor Concept 1”. The metric:

(IS-A, 2, “Anchor Concept 1”)

for concept C means that C is related to “Anchor Concept 1” by two IS-A (or subclass-of) relationships. That is,
there is some concept X that is a subclass of “Anchor Concept 1” and C is a subclass of X. Other metrics that
partially describe the sending domain ontology can also be given, as we will see in the examples below.
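
As a minimal sketch of how such an IS-A metric could be derived, the snippet below counts subclass-of hops from a concept up to an anchor. It assumes, purely for illustration, that every concept has at most one direct superclass stored in a `parent` map; the names `Metric` and `is_a_metric` are hypothetical.

```python
from typing import Dict, NamedTuple, Optional

class Metric(NamedTuple):
    relation: str   # e.g. "IS-A"
    steps: int      # number of relationship hops to the anchor
    anchor: str     # name of the shared anchor concept

def is_a_metric(concept: str, anchor: str,
                parent: Dict[str, str]) -> Optional[Metric]:
    """Count IS-A hops from `concept` up to `anchor`; None if unreachable."""
    steps, current = 0, concept
    seen = {concept}
    while current != anchor:
        if current not in parent:
            return None
        current = parent[current]
        if current in seen:          # guard against cyclic input
            return None
        seen.add(current)
        steps += 1
    return Metric("IS-A", steps, anchor)

# C is a subclass of X, and X is a subclass of the anchor.
parent = {"C": "X", "X": "Anchor Concept 1"}
metric = is_a_metric("C", "Anchor Concept 1", parent)
```

Here `metric` corresponds to the second example above, (IS-A, 2, “Anchor Concept 1”).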

2. Match ontologies. Once a set of metrics <M1, ..., Mn> has arrived at the receiving domain, it tries to
determine which concept in its ontology, if any, might be a good match for the data that will arrive. This is done
by applying graph search algorithms based on the received metrics. Each entity in the domain ontology is given
a score (a value between 0 and 1) for each metric. The set of scores for each entity is then combined and
normalized into a final value that represents the confidence of it being a good match (again, between 0 and 1).
The best k matches are selected and each is associated with a key. The reason for using keys is to avoid having
to disclose anything of the domain ontology to the sending domain. Then k triples:

<key, relevant properties, matching score>

are sent to the sending domain. This gives the sending domain a chance to pick a desired match.
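
The selection of the best k matches and the use of opaque keys might look like the sketch below. The combination rule (a plain average, which keeps the result between 0 and 1) is an assumption made for illustration, since the text does not fix one at this point; `best_matches`, the concept names and the property labels are likewise hypothetical.

```python
import secrets
from typing import Dict, List, Tuple

def best_matches(scores: Dict[str, List[float]],
                 properties: Dict[str, str],
                 k: int) -> Tuple[List[Tuple[str, str, float]], Dict[str, str]]:
    """Return k response triples plus the private key-to-concept table."""
    # Assumed combination rule: average of the per-metric scores.
    combined = {c: sum(s) / len(s) for c, s in scores.items()}
    ranked = sorted(combined, key=lambda c: combined[c], reverse=True)[:k]
    key_table: Dict[str, str] = {}
    triples = []
    for concept in ranked:
        key = secrets.token_hex(8)   # opaque: reveals nothing about the concept
        key_table[key] = concept     # never leaves the receiving domain
        triples.append((key, properties[concept], combined[concept]))
    return triples, key_table

scores = {"A": [0.9, 0.8], "B": [0.4, 0.6], "D": [0.7, 0.9]}
properties = {"A": "property1", "B": "property2", "D": "property3"}
triples, key_table = best_matches(scores, properties, k=2)
```

Only `triples` would be sent back; `key_table` lets the receiving domain resolve the key chosen later by the sender.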

3. Metadata and selection. The sending domain can now make its decision based on the received response
triples: the property set for a particular match and its likelihood of being a good match. The most likely
scenario is for the sending domain to prioritize a given property, but this need not be the case. In some cases a
good match might be preferred despite detail degradation, while in other cases a lesser ontology match might be
preferred when a given property has a higher priority. Nonetheless, the choice lies with the sending domain,
which is responsible for the data leaving its domain. Without acceptable guarantees given by the receiving
domain, the response can also be “reject”, in which case the data is not sent at all. This means that the sending
domain does not want to send the data to that particular receiving domain. If the choice instead is “accept”, one
of the keys is picked and the data is sent together with the key. The key is here a representative of the metadata
in the sense of “data and metadata are inseparable.”

4. Data storage. Once the response from the sending domain is received, the receiving domain can classify
the newly received data by using the corresponding concept represented by the key.
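
The accept/reject decision of step 3 can be illustrated with the following sketch. The decision rule shown (keep only responses whose properties the sender finds acceptable, then take the highest score) is one possible policy, assumed here for illustration; `select_key`, `acceptable` and `min_score` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Response = Tuple[str, str, float]  # (key, relevant properties, matching score)

def select_key(responses: List[Response],
               acceptable: Set[str],
               min_score: float = 0.0) -> Optional[str]:
    """Return the chosen key ("accept") or None ("reject")."""
    candidates = [r for r in responses
                  if r[1] in acceptable and r[2] >= min_score]
    if not candidates:
        return None                  # reject: the data is not sent at all
    return max(candidates, key=lambda r: r[2])[0]

responses = [("k1", "propA", 0.8), ("k2", "propB", 0.6)]
choice = select_key(responses, acceptable={"propB"})
rejected = select_key(responses, acceptable={"propC"})
```

In this example the sender prefers the lesser match `k2` because only its property set is acceptable, mirroring the trade-off described in step 3.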
3. Example


In the following we provide an example that demonstrates the intuition behind the steps described above.

Describe matching metrics. The domain ontology for the sending domain is shown in Figure 3. It describes
concepts such as “Weather Reading”, “Wind Reading” and “Location”. Notice that we do not only have IS-A
relationships; we also have disjoints (e.g. “Wind Reading” and “Hourly Temperature Reading” are disjoint),
properties (e.g. location), domain and range restrictions (the domain and range of property “location” are
“Weather Reading” and “Location”, respectively), and property restrictions (e.g. “= 1 location”, which means
that an instance of “Weather Reading” is related to exactly one instance of “Location” via the property
“location”). Hence, we make use of several different kinds of ontology constructs to model our domain.


Figure  3.  Ontology  on  the  sending  domain. 


Concepts with a double border are anchors (pre-agreed and shared concepts). In this example the
concept “Wind Reading” represents the metadata that is to be sent along with the data. Since we know that the
receiving domain ontology has the concepts “Weather Reading” and “Location” (they are the anchors), our goal
is to describe enough about the concept “Wind Reading” in relation to the anchors such that the receiving
domain can do a reasonable match onto its concepts and ontology structure. This description, which we call our
metrics, could for example be the ones given in Table 1.


Table 1. Metrics describing a partial ontology

Metric                          Explanation

(IS-A, 1, “Weather Reading”)    The metadata concept is one step removed from the anchor
                                “Weather Reading” via an IS-A relationship. That is, the
                                metadata concept is a subclass of “Weather Reading”.

(disjoint, sibling)             The metadata concept has a disjoint sibling via the IS-A
                                relationship. Notice that thanks to the first metric (above),
                                we already know that this is in relation to the “Weather
                                Reading” anchor.

(domain, 1, = 1 restriction,    The metadata concept is the domain of an “exactly one value”
range: “Location”)              restricted property that has as range the anchor concept
                                “Location”.

(domain, 1, = 1 restriction,    The metadata concept is the domain of an “exactly one value”
range: unknown)                 restricted property that has an unknown range concept. Again,
                                we know that this description is in relation to the “Weather
                                Reading” anchor, and we know that the range is not an anchor,
                                because otherwise it could be given.

It should be noted that all the descriptions of the metadata concept given in Table 1 are in relation to an
anchor. This is important since the anchors are the only agreed-upon concepts between different domain
ontologies. Intuitively, the descriptions in Table 1 correspond to the partial ontology structure highlighted in
Figure 4.


It should be noted that it is not always desirable to only match the topological structure of ontology graphs. It
can also be highly desirable to give larger weight, or preference, to certain kinds of relationships. For example,
one could say that a matching IS-A relationship is more important than whether or not a concept is the domain
for some specific property. Hence, the relationships between the “nodes” (entities) in the ontology structure can
play an important role.

Match ontologies. At the receiving domain, we now assume that the metrics (the partial ontology description)
from Table 1 have arrived. The task is now to determine which concept in the local ontology is a likely match
for the metadata (concept) that will be sent from the sending domain. The local ontology is illustrated in
Figure 5. The ontology describes similar, but different, terms compared to the ontology in Figure 3. That the
ontologies partly overlap should be clear since the domains have already agreed on some shared concepts (the
anchors).


The matching is done by searching the local ontology graph structure and assigning scores to nodes for each
of the metrics. An example of scores given to only some of the concepts in the ontology is shown in Figure 6.


Figure 6. Assigning scores to concepts in the receiving domain ontology.


The order of the scores corresponds to the descriptions in Table 1. For example, the first metric in Table 1
states that the metadata concept is a direct subclass of the anchor “Weather Reading”. Both the concepts
“Coastal” and “Inland” match this description and get a high score. The concepts “Coastal Wind” and “Daily
Temperature Reading” are not exact matches, but close, and receive a lower score. These calculations are done
for each of the metrics that are sent from the sending domain. When we combine the scores (here by multiplying
them) we get something like what is shown in Table 2. These scores can then be normalized, but this is left out
here.


Table 2. Scores for metadata concepts.

Key     Metadata Concept             Score

key1    Coastal                      0.75 * 0.75 * 0.75 * 0.30 = 0.1265625
key2    Inland                       0.75 * 0.75 * 0.75 * 0.30 = 0.1265625
key3    Coastal Wind                 0.75 * 0.75 * 0.75 * 0.55 = 0.2320312
key4    Daily Temperature Reading    0.55 * 0.75 * 0.75 * 0.55 = 0.1701562

A cut-off value can be selected to limit the number of choices sent back to the sending domain. For example,
0.16 might be chosen for the above example. Hence, the domain would send back the following choices:

<key3, “property1”, 0.23>

<key4, “property2”, 0.17>

Here we have assumed that there is some understanding of what the property descriptions mean; these could
be more elaborate and must be pre-agreed between the domains along with the anchor concepts.
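
The arithmetic of Table 2 and the cut-off step can be reproduced with a few lines of Python. The per-metric values are taken from Table 2 as printed; the variable names are illustrative only.

```python
from math import prod

# Per-metric scores as listed in Table 2 (one value per metric of Table 1).
scores = {
    "key1": [0.75, 0.75, 0.75, 0.30],   # Coastal
    "key2": [0.75, 0.75, 0.75, 0.30],   # Inland
    "key3": [0.75, 0.75, 0.75, 0.55],   # Coastal Wind
    "key4": [0.55, 0.75, 0.75, 0.55],   # Daily Temperature Reading
}
CUTOFF = 0.16

# Combine by multiplication, as in Table 2, then keep scores above the cut-off.
combined = {key: prod(values) for key, values in scores.items()}
selected = {key: round(score, 2)
            for key, score in combined.items() if score >= CUTOFF}
```

With the cut-off of 0.16, only key3 (0.23) and key4 (0.17) remain, which matches the two choices listed above.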

Metadata and selection. The sending domain looks at the options, decides that “property1” is equivalent
to that of its original concept for the data to be sent, and decides to go with what is considered the best match.
Hence, the sending domain sends the data together with “key3.”

Data storage. The response is handled in the receiving domain by properly categorizing the data and hence
protecting the data according to the policy. In this example, the received data would be tagged with the metadata
“Coastal Wind”.

Overview  Summary 

The example above has been used to demonstrate how ontology matching can be used to facilitate cross-
domain security communication. The approach is based on the knowledge that each domain has a domain
ontology that acts as its data model. Policies are specified with respect to this data/domain model and are
used to guarantee the security of the underlying data.

For domains to share data (secure cross-domain information sharing) they first need to negotiate the terms for
sharing the data. This is done by matching their ontologies, but without the requirement to fully disclose their
ontologies. To be able to do this, the domains have already agreed to certain common concepts. These fully
disclosed and shared concepts are referred to as “anchors.”

The domain which is to receive the data tries to find a good match in its ontology and sends some
alternatives back to the sending domain, including information about the properties associated with those
ontology matches. The sending domain then has the opportunity to decide what to do, and under which terms to
send the data. Once the data is sent, the proper matching can be enforced in the receiving domain, as agreed to
by the sending domain.

The crucial point here is to investigate good metrics for describing useful ontology structure with respect to
agreed-upon ontology anchors, and further to devise a successful matching technique that can be properly
evaluated and demonstrated to give good results. We outline our initial approach to this matching process below,
but more work and evaluation are needed.
4. Secure Ontology Matching Algorithm

In this section, we describe our approach to perform graph-based matching of ontologies. Unlike traditional
ontology matching, which matches the entities between different ontologies, our problem is to find the best
match (i.e., an entity) in the receiving ontology for a given metadata in the sending ontology. Also, only
graph-based matching (or structural matching) is allowed in our problem, because the descriptions of the
entities in the ontology are not to be shared.


Architecture  Overview 

Figure  7  shows  the  architecture  for  ontology  matching.  On  each  side,  there  is  a  domain  ontology  (or  ontology 
block).  Also,  there  are  anchors  that  are  predefined  by  humans.  Anchors  are  alignments  with  high  similarities. 
They  are  necessary  for  finding  matches  of  the  metadata.  The  red  arrows  show  the  information  flow.  When  the 
sending  side  receives  a  metadata,  it  will  construct  matrices  or  a  set  of  descriptions  describing  the  relationships 
between  the  metadata  and  anchors.  The  matrices  are  then  sent  to  the  receiving  side,  which  is  responsible  for 
finding  a  list  of  candidates.  Since  the  contents  of  the  candidates  are  not  allowed  to  be  shared,  the  receiving  side 
will  only  send  a  list  of  candidate  keys  with  their  properties  to  the  sending  side.  Then  the  sending  side  will  pick  a 
candidate  and  send  back  its  choice.  Finally  the  receiving  side  will  attach  the  selected  entity  (i.e.,  metadata)  to  the 
data  and  forward  it. 

The  major  processing  steps  on  the  sending  side  include: 

1.  If  the  ontology  is  a  large  ontology  with  more  than  1000  entities,  partition  it  into  blocks  of  RDF  triples. 
The  divide-and-conquer  method  described  in  [8]  will  be  used  for  partitioning  here. 

2.  Given  a  metadata  M,  retrieve  the  ontology  block  and  construct  a  graph  G  for  the  block. 

3. Apply a depth-first search algorithm to construct the matrices or a set of descriptions describing the
position of the metadata with respect to the anchors in the block. (Note: refer to the algorithm design
part for the details of the depth-first search algorithm.)

4.  Send  the  metrics  to  the  receiving  side. 


The  major  processing  steps  on  the  receiving  side  include: 

1. If the ontology is a large ontology with more than 1000 entities, partition it into blocks in the same way
as on the sending side.

2. For each metric obtained from the sending side, extract the information about the anchor. Retrieve the
ontology blocks that contain the anchor, and construct graphs for the blocks.

3. For each metric, apply a width-first search algorithm to compute the scores of candidate entities that
could potentially match the metadata. Note that this search is done in every retrieved ontology block.
(Note: refer to the algorithm design part for the details of the width-first search algorithm.)

4.  Compute  the  weighted  sums  of  scores  for  the  candidate  entities.  Rank  the  candidates  and  attach 
information  about  key  and  security  level. 

5.  Send  the  ranked  list  of  candidate  entities  to  the  sending  side. 
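
Step 4 on the receiving side can be sketched as a normalized weighted sum followed by a sort. The weights below are hypothetical (the text does not specify them); the per-metric scores reuse two rows of Table 2, and `rank_candidates` is an illustrative name.

```python
from typing import Dict, List, Tuple

def rank_candidates(scores: Dict[str, List[float]],
                    weights: List[float]) -> List[Tuple[str, float]]:
    """Rank candidates by the normalized weighted sum of their per-metric scores."""
    total = sum(weights)
    ranked = [
        (entity, sum(w * s for w, s in zip(weights, values)) / total)
        for entity, values in scores.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical weights: the IS-A metric counts twice as much as the others.
weights = [2.0, 1.0, 1.0, 1.0]
scores = {
    "Coastal": [0.75, 0.75, 0.75, 0.30],
    "Daily Temperature Reading": [0.55, 0.75, 0.75, 0.55],
}
ranking = rank_candidates(scores, weights)
```

Note that with these assumed weights “Coastal” (0.66) ranks above “Daily Temperature Reading” (0.63), the opposite of the unweighted product in Table 2, which shows how much the choice of weights can influence the outcome.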

Search-based  Matching  Algorithm 

In this section, we describe the algorithm design for the matching process. On the sending side, we will
design a depth-first search algorithm to construct the metrics. On the receiving side, we will design a
width-first search algorithm to generate a list of candidate entities.
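
To give a feel for the receiving side, the sketch below runs a width-first (breadth-first) search from an anchor and records how many edges separate each entity from it. It is not the paper's scoring algorithm: the graph fragment is hypothetical, edges are treated as undirected, and no score function is applied.

```python
from collections import deque
from typing import Dict, List

def bfs_distances(graph: Dict[str, List[str]], anchor: str) -> Dict[str, int]:
    """Return the number of edges from `anchor` to every reachable entity."""
    distance = {anchor: 0}
    queue = deque([anchor])
    while queue:
        v = queue.popleft()          # FIFO order makes the search width-first
        for w in graph.get(v, []):
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return distance

# Hypothetical fragment of a receiving ontology, as adjacency lists.
graph = {
    "Weather Reading": ["Coastal", "Inland"],
    "Coastal": ["Weather Reading", "Coastal Wind"],
    "Inland": ["Weather Reading"],
    "Coastal Wind": ["Coastal"],
}
distances = bfs_distances(graph, "Weather Reading")
candidates = [e for e, d in distances.items() if d == 1]
```

Entities whose distance equals the step count in a received metric (here 1) would be the exact structural matches for that metric; entities at other distances could be given lower scores.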

The listing below shows the pseudocode for constructing the matrices on the sending side. This depth-first
search algorithm takes the ontology graph G and the metadata M (a vertex of G) as input. It assumes that there
is a list of anchors in the ontology (Line 3). It initializes a tree T to the starting vertex, and a list L which
stores the edges that are visited (Lines 4, 5). In the search function Search(vertex v), the algorithm first marks
the vertex as visited and checks whether the vertex is an anchor. If the vertex is an anchor, the search stops
(Lines 21, 22, 23). If a vertex v has several unmarked neighbors, the edges between the vertex and the
neighbors will be appended to the list L (Lines 24, 25). Note that it would be equally correct to visit the
neighbors in any order. The easiest method to implement would be to visit the neighbors in the order they are
stored in the adjacency list for v. As a depth-first search algorithm, it removes edges from the end of list L so
that the list acts as a stack rather than a queue (Line 10). Also, each vertex is marked at most once, and each
edge is added to the list L at most once and therefore removed from the list at most once. The spanning tree T
constructed by the algorithm is a depth-first search tree, where the leaf nodes contain the anchors. In T, a path
from the root (i.e., metadata M) to a leaf node (i.e., an anchor) describes the position of the metadata with
respect to the anchor. The easiest method to describe the position would be to count the steps from the
metadata to the anchor.

1.  DFS (G, M)    G is the ontology graph, M is the metadata
2.  {
3.      List A = set of anchors in G
4.      List L = empty
5.      Tree T = empty
6.      Choose M as the starting vertex
7.      Search (M)
8.      While (L not empty)
9.      {
10.         Remove edge (v, w) from end of L
11.         If w not yet visited
12.         {
13.             Add edge (v, w) to T
14.             Search (w)
15.         }
16.     }
17. }
18.
19. Search (vertex v)
20. {
21.     Mark v as visited
22.     If v is an anchor in A
23.         return
24.     For each edge (v, w)
25.         Add edge (v, w) to end of L
26. }
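
The pseudo code above can be made concrete with a short Python sketch. It follows the same structure (an edge stack L, a tree T, and a search that stops at anchors) and additionally records the number of steps from the metadata to each anchor reached. The function name `dfs_anchor_steps`, the adjacency-list dictionary, and the example graph are assumptions made for illustration. Note that the step count is the length of the path in the depth-first search tree, which is not necessarily the shortest path in the graph.

```python
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Hashable]]  # adjacency lists (assumed representation)
Edge = Tuple[Hashable, Hashable]


def dfs_anchor_steps(
    graph: Graph, metadata: Hashable, anchors: Set[Hashable]
) -> Tuple[List[Edge], Dict[Hashable, int]]:
    """Depth-first search from the metadata vertex that stops at anchors.

    Returns the tree edges and, for every anchor reached, its depth in the
    search tree (the number of steps from the metadata along the tree path).
    """
    visited: Set[Hashable] = set()
    stack: List[Edge] = []          # the list L, used as a stack
    tree: List[Edge] = []           # the tree T, stored as a list of edges
    depth: Dict[Hashable, int] = {metadata: 0}

    def search(v: Hashable) -> None:
        visited.add(v)
        if v in anchors:            # do not expand beyond an anchor
            return
        for w in graph.get(v, []):
            stack.append((v, w))

    search(metadata)
    while stack:
        v, w = stack.pop()          # remove from the end: stack behaviour
        if w not in visited:
            tree.append((v, w))
            depth[w] = depth[v] + 1
            search(w)

    steps = {a: depth[a] for a in anchors if a in depth}
    return tree, steps


# Illustrative graph: M is linked to B and C, each of which leads to an anchor.
example_graph: Graph = {
    "M": ["B", "C"],
    "B": ["M", "A1"],
    "C": ["M", "A2"],
    "A1": ["B"],
    "A2": ["C"],
}
dfs_tree, dfs_steps = dfs_anchor_steps(example_graph, "M", {"A1", "A2"})
# Both anchors are two steps from M in this graph.
```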

The depth-first search algorithm is mainly for describing the metadata's position with respect to the anchors.
When we build metrics describing the relative position between the metadata and non-anchor entities, we will
still apply this algorithm to reference the metadata's position with respect to anchors. This is because the
reasoning on the receiving side requires the information about anchors.

The listing below shows the pseudo code for matching on the receiving side. This width-first
search algorithm takes the ontology graph G and the anchor A (a vertex of G) as input. It initializes a list L,
which stores the edges discovered during the search, and a tree T rooted at the starting vertex (Line 3, 4). In
the search function Search(vertex v), the algorithm first marks the vertex as visited and assigns a score based
on the metric description. If the score is smaller than a critical value, the search stops at that vertex
(Line 20, 21, 22, 23). Otherwise, the edges between v and its neighbors are appended to the list L (Line
24, 25). Because this is a width-first search, the algorithm removes edges from the start of list L, so that the
list acts as a queue rather than a stack (Line 9). Each vertex is marked at most once, so each edge (v, w) is
added to the list L at most once and therefore removed from the list at most once. The tree T constructed by
the algorithm is a width-first search tree of the vertices reached during the search. These vertices represent
the candidates that could potentially match the metadata.

1.  WFS (G, A)    G is the ontology graph, A is the anchor
2.  {
3.      List L = empty
4.      Tree T = empty
5.      Choose A as the starting vertex
6.      Search (A)
7.      While (L not empty)
8.      {
9.          Remove edge (v, w) from start of L
10.         If w not yet visited
11.         {
12.             Add edge (v, w) to T
13.             Search (w)
14.         }
15.     }
16. }
17.
18. Search (vertex v)
19. {
20.     Mark v as visited
21.     Assign a score S based on the metric description
22.     If the score S is smaller than a critical value
23.         return
24.     For each edge (v, w)
25.         Add edge (v, w) to end of L
26. }
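
A Python sketch of the receiving-side search is given below. It mirrors the pseudo code, using a queue of edges and a search that stops expanding a vertex whose score falls below the critical value. The function name `wfs_candidates`, the adjacency-list dictionary, and the assumption that the score depends only on the number of steps from the anchor are illustrative; the source leaves the scoring function open. In this sketch the critical value must be low enough that vertices nearer to the anchor than the described position are still expanded.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Hashable]]  # adjacency lists (assumed representation)
Edge = Tuple[Hashable, Hashable]


def wfs_candidates(
    graph: Graph,
    anchor: Hashable,
    score_fn: Callable[[int], float],
    critical: float,
) -> Tuple[List[Edge], Dict[Hashable, float]]:
    """Width-first (breadth-first) search from an anchor with score pruning.

    score_fn maps a vertex's step count from the anchor to a score
    (a simplifying assumption). Vertices scoring below `critical` are
    recorded but not expanded. Returns the tree edges and the scores of
    all vertices reached.
    """
    visited: Set[Hashable] = set()
    queue: Deque[Edge] = deque()    # the list L, used as a queue
    tree: List[Edge] = []           # the tree T, stored as a list of edges
    depth: Dict[Hashable, int] = {anchor: 0}
    scores: Dict[Hashable, float] = {}

    def search(v: Hashable) -> None:
        visited.add(v)
        scores[v] = score_fn(depth[v])
        if scores[v] < critical:    # prune: do not expand this vertex
            return
        for w in graph.get(v, []):
            queue.append((v, w))

    search(anchor)
    while queue:
        v, w = queue.popleft()      # remove from the start: queue behaviour
        if w not in visited:
            tree.append((v, w))
            depth[w] = depth[v] + 1
            search(w)
    return tree, scores


# Illustrative chain A - b - c - d - e - f - g; the metric says "two steps away".
chain: Graph = {
    "A": ["b"], "b": ["A", "c"], "c": ["b", "d"], "d": ["c", "e"],
    "e": ["d", "f"], "f": ["e", "g"], "g": ["f"],
}
wfs_tree, wfs_scores = wfs_candidates(chain, "A", lambda d: 3 - abs(d - 2), critical=1)
# Vertex c (two steps from A) scores highest; g is never reached because
# the search is pruned at f.
```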


We recognize that how to assign a score to a vertex is a research issue that needs further investigation. Figure
8 illustrates a simple example of generating the score for a vertex. The basic idea is to define a scoring function
based on the metric. According to the scoring function, each vertex gets a score based on its position relative
to the anchor. The vertex that perfectly matches the metric gets the highest score, the next nearest vertex gets a
smaller score, and so on. We will investigate various ways to define the scoring function.
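
As one hypothetical instance of such a scoring function, the sketch below assumes that the metric is simply a step count from the anchor and lets the score decay with the distance from that count. The name `position_score` and the particular formula are assumptions chosen for illustration, not the function used in Figure 8.

```python
def position_score(steps: int, target_steps: int) -> float:
    """Score a vertex by how closely its position matches the metric.

    steps is the vertex's step count from the anchor; target_steps is the
    step count stated in the metric. A perfect match scores 1.0 and the
    score decreases as the mismatch grows (illustrative formula).
    """
    if steps < 0 or target_steps < 0:
        raise ValueError("step counts must be non-negative")
    return 1.0 / (1.0 + abs(steps - target_steps))
```

With a metric of two steps, a vertex two steps from the anchor scores 1.0, while vertices one or three steps away both score 0.5.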

5  Conclusions and Future Research

Our initial results indicate that very consistent matches can be achieved using the techniques described in this
paper. The quality of the matches between ontologies is highly dependent on the choice of anchors and on the set
of relationships (IS-A, etc.) and properties supported.

There  are  several  points  that  require  further  research.  Here  we  only  list  a  few  of  them  that  we  consider  to  be 
important: 

•  It is unclear how good anchor concepts should be selected. There are certain requirements, such as being
shareable and general enough to facilitate successful ontology matching. What makes a good anchor is,
however, unclear and will require evaluation of real-world scenarios. A methodology for selecting good
anchor points is needed.

•  Once a data item has been successfully negotiated and transferred from domain A to domain B, it is
important that the same data item can cross back without being incorrectly classified. That is, a correct
reverse operation must be guaranteed. The criteria for this operation must be defined and understood, and
the ontology matching algorithms must guarantee this property.

•  It is important that higher priority can be given to certain relationships. In some cases it might
be very important for the sending domain to ensure that a data item's metadata is matched against a
concept in the receiving ontology that is closely related to some anchor A via the IS-A relationship,
rather than, for example, a concept that is the domain of a certain property. Hence, it should
be possible for the sending domain to assign weights to the metrics that are sent to the receiving
domain, and the receiving domain must take these weights into consideration. How the weights are
properly respected in the graph matching algorithm needs to be clarified.